Contextually-Mediated Semantic Similarity Graphs for Topic Segmentation
Authors
Abstract
We present a representation of documents as directed, weighted graphs, modeling the range of influence of terms within the document as well as contextually determined semantic relatedness among terms. We then show the usefulness of this kind of representation in topic segmentation. Our boundary detection algorithm uses this graph to determine topical coherence and potential topic shifts, and does not require labeled data or training of parameters. We show that this method yields improved results on both concatenated pseudo-documents and on closed-captions for television programs.
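The abstract does not spell out the graph construction or the boundary criterion, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes exact term overlap in place of contextually determined semantic relatedness, and the names `build_graph`, `gap_cohesion`, `boundaries`, `decay`, and `max_range` are hypothetical. Each sentence is a node, a directed edge runs from an earlier sentence to a later one that shares terms with it, and the edge weight fades with distance to model a limited range of influence.

```python
from collections import defaultdict


def build_graph(sentences, decay=0.5, max_range=3):
    """Return directed weighted edges (i, j), i < j, between sentences.

    Assumption: relatedness is plain term overlap. Each shared term adds
    decay ** (j - i - 1), so influence fades with distance and stops
    after max_range sentences.
    """
    edges = defaultdict(float)
    term_sets = [set(s.lower().split()) for s in sentences]
    for i, terms_i in enumerate(term_sets):
        for j in range(i + 1, min(i + 1 + max_range, len(term_sets))):
            shared = terms_i & term_sets[j]
            if shared:
                edges[(i, j)] += len(shared) * decay ** (j - i - 1)
    return dict(edges)


def gap_cohesion(edges, n_sentences):
    """Cohesion at gap g is the total weight of edges spanning sentences g and g + 1."""
    scores = [0.0] * max(n_sentences - 1, 0)
    for (i, j), weight in edges.items():
        for g in range(i, j):
            scores[g] += weight
    return scores


def boundaries(scores):
    """Return the gaps with the lowest cohesion, a crude stand-in for boundary detection."""
    if not scores:
        return []
    lowest = min(scores)
    return [g for g, s in enumerate(scores) if s == lowest]


sentences = [
    "the cat sat",
    "the cat slept",
    "stocks fell sharply",
    "stocks fell again",
]
graph = build_graph(sentences)
print(boundaries(gap_cohesion(graph, len(sentences))))
```

Like the method described in the abstract, this sketch needs no labeled data; unlike it, the decay and range values here are arbitrary illustrative constants rather than anything derived from the document.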
Related resources
Diachronic semantic cohesion for topic segmentation of TV broadcast news
This paper proposes a new way to integrate semantic relations into a topic segmentation process by defining the notion of semantic cohesion. In the context of a sliding-window-based automatic topic segmentation algorithm, semantic relations are incorporated in the similarity measure between adjacent blocks. Additionally, in the context of TV Broadcast News topic segmentation, we propose a new prot...
Similarity for Natural Semantic Networks
A natural semantic network (NSN) represents the knowledge of a group of persons with respect to a particular topic. Comparing NSNs would make it possible to discover how close one group is to another in terms of expertise in the topic, for example, how close apprentices are to experts or students to teachers. We propose to conceive natural semantic networks as weighted bipartite graphs and to extract fe...
An Orthonormal Basis for Topic Segmentation in Tutorial Dialogue
This paper explores the segmentation of tutorial dialogue into cohesive topics. A latent semantic space was created using conversations from human-to-human tutoring transcripts, allowing cohesion between utterances to be measured using vector similarity. Previous cohesion-based segmentation methods that focus on expository monologue are reapplied to these dialogues to create benchmarks for perfo...
Transcript Segmentation Using Utterance Cosine Similarity Measure
One of the problems addressed by the Tracker project is the extraction of the key issues discussed at meetings through the analysis of transcripts. Whilst topic extraction is easy for humans, it has proven difficult to automate given the unstructured nature of our transcripts. This paper proposes a new approach to transcript segmentation based on the Utterance Cosine Sim...
Automatic Hashtag Recommendation in Social Networking and Microblogging Platforms Using a Knowledge-Intensive Content-based Approach
In social networking/microblogging environments, the #tag is often used for categorizing messages and marking their key points. Also, since some social networks such as Twitter restrict the number of characters in messages, #tags can serve as a useful tool for helping users express their messages. In this paper, a new knowledge-intensive content-based #tag recommendation system is intr...